168 research outputs found

Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

Authors: Liu Boyi, Liu Ming, Wang Lujia
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/05/2019

This paper was motivated by the problem of how to make robots fuse and transfer their experience so that they can effectively use prior knowledge and quickly adapt to new environments. To address the problem, we present a learning architecture for navigation in cloud robotic systems: Lifelong Federated Reinforcement Learning (LFRL). In this work, we propose a knowledge fusion algorithm for upgrading a shared model deployed on the cloud. Then, effective transfer learning methods in LFRL are introduced. LFRL is consistent with human cognitive science and fits well in cloud robotic systems. Experiments show that LFRL greatly improves the efficiency of reinforcement learning for robot navigation. The cloud robotic system deployment also shows that LFRL is capable of fusing prior knowledge. In addition, we release a cloud robotic navigation-learning website based on LFRL.

arXiv.org e-Print Archive

Crossref

Agricultural Robot for Intelligent Detection of Pyralidae Insects

Authors: Hu Zhuhua, Liu Boyi, Zhao Yaochi
Publication venue: IntechOpen
Publication date: 05/11/2018

Pyralidae insects are among the main pests of economic crops. However, manual detection and identification of Pyralidae insects is labor intensive and inefficient, and subjective factors can influence recognition accuracy. To address these shortcomings, an insect monitoring robot and a new method for recognizing Pyralidae insects are presented in this chapter. First, the robot captures images by performing a fixed action and detects whether there are Pyralidae insects in the images. The recognition method obtains the total probability image by using reverse mapping of the histogram and multi-template images, and the image contours can then be extracted quickly and accurately by using constrained Otsu thresholding. Finally, the contours are filtered according to their Hu moment, perimeter, and area features, and recognition results marked with triangles are obtained. According to the recognition results, the speed of the robot car and mechanical arm can be adjusted adaptively. The theoretical analysis and experimental results show that the proposed scheme has high timeliness and high recognition accuracy in the natural planting scene.

IntechOpen
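
The thresholding step mentioned in the abstract above can be illustrated with a minimal sketch of the standard Otsu method, which picks the gray level that maximizes the between-class variance of the histogram. This is plain Otsu, not the chapter's constrained variant, whose details the abstract does not give; the function name `otsu_threshold` and the toy pixel list are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the background class, the rest the foreground.
    Standard Otsu only; the chapter's constrained variant is not reproduced.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the background class
    sum0 = 0    # sum of gray levels in the background class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue          # background class still empty
        if w1 == 0:
            break             # foreground class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Hypothetical toy image: a dark mode at 10 and a bright mode at 200.
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
mask = [p > threshold for p in toy_pixels]  # True marks foreground pixels
```

On the toy data the threshold falls between the two modes, so the mask separates the bright pixels from the dark ones. A real pipeline would apply the threshold to the probability image rather than to raw gray levels.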

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

Authors: Cai Qi, Liu Boyi, Wang Zhaoran, Yang Zhuoran
Publication date: 11/09/2019

Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate. Comment: A short versio

arXiv.org e-Print Archive
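
The mirror descent view in the abstract above can be made concrete with a finite-dimensional toy: a single-state problem with a fixed value per action, where a KL-regularized mirror descent step reduces to a multiplicative (exponentiated-gradient) update on the action probabilities. This is only a tabular sketch under those assumptions, not the paper's neural PPO/TRPO variant; the names `mirror_descent_step` and `run_mirror_descent`, the fixed `q_values`, and the step size are all illustrative.

```python
import math


def mirror_descent_step(policy, q_values, step_size):
    """One KL mirror descent step on the probability simplex.

    The new policy is proportional to policy[a] * exp(step_size * q_values[a]).
    """
    # Work in log space; the floor avoids log(0) if a probability underflows.
    logits = [
        math.log(max(p, 1e-300)) + step_size * q
        for p, q in zip(policy, q_values)
    ]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    norm = sum(weights)
    return [w / norm for w in weights]


def run_mirror_descent(q_values, steps=200, step_size=0.5):
    """Start from the uniform policy and apply repeated mirror descent steps."""
    n = len(q_values)
    policy = [1.0 / n] * n
    for _ in range(steps):
        policy = mirror_descent_step(policy, q_values, step_size)
    return policy


# Hypothetical action values: the first action is the best one.
final_policy = run_mirror_descent([1.0, 0.5, 0.0])
```

With fixed action values the iterates concentrate on the best action, which is the simplest instance of the global convergence behavior the abstract refers to. In the paper's setting the values and the policy are both approximated by neural networks, which this sketch does not attempt to model.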

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Authors: Liu Boyi, Wang Zhaoran, Zhang Shenao, Zhao Tuo
Publication date: 30/10/2023

ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This is in contrast to the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To comprehend this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of the model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor that affects the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM. Comment: Published at NeurIPS 202

arXiv.org e-Print Archive
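
Spectral normalization, as named in the abstract above, is commonly implemented by estimating a weight matrix's largest singular value with power iteration and dividing the matrix by it, which bounds the layer's Lipschitz constant by 1. The sketch below shows that generic procedure in plain Python; it is not taken from the authors' repository, and the helper names, iteration count, and the assumption of a nonzero matrix are illustrative.

```python
import math
import random


def _matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def _matvec_t(W, u):
    """Compute W.T @ u."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def _normalize(v):
    """Scale v to unit Euclidean norm (assumes v is nonzero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(W, iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    v = _normalize([rng.gauss(0.0, 1.0) for _ in W[0]])
    for _ in range(iters):
        u = _normalize(_matvec(W, v))     # left singular vector estimate
        v = _normalize(_matvec_t(W, u))   # right singular vector estimate
    return math.sqrt(sum(x * x for x in _matvec(W, v)))


def spectral_normalize(W, iters=100):
    """Return W divided by its estimated spectral norm (assumes W is nonzero)."""
    sigma = spectral_norm(W, iters)
    return [[w / sigma for w in row] for row in W]


# Hypothetical weight matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
W_sn = spectral_normalize(W)
```

After normalization the matrix has spectral norm close to 1, so repeated application through a long model unroll cannot amplify perturbations through this layer alone. Deep learning libraries usually keep the power-iteration vector between training steps and run only one iteration per step; that optimization is omitted here for clarity.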

p38MAPK plays a pivotal role in the development of acute respiratory distress syndrome

Authors: Fang Zhicheng, Feng Ying, Liu Boyi, Zheng Xiang
Publication venue: Hospital das Clínicas, Faculdade de Medicina, Universidade de São Paulo
Publication date: 01/01/2019

Acute respiratory distress syndrome (ARDS) is a life-threatening illness characterized by a complex pathophysiology, involving not only the respiratory system but also nonpulmonary distal organs. Although advances in the management of ARDS have led to a distinct improvement in ARDS-related mortality, ARDS is still a life-threatening respiratory condition with long-term consequences. A better understanding of the pathophysiology of this condition will allow us to create a personalized treatment strategy for improving clinical outcomes. In this article, we present a general overview of p38 mitogen-activated protein kinase (p38MAPK) and recent advances in understanding its functions. We consider the potential of the pharmacological targeting of p38MAPK pathways to treat ARDS.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Cadernos Espinosanos (E-Journal)